shared expert dp for deepseek r1 alltoall #3235
Conversation
Signed-off-by: zhaozx-cn <[email protected]>
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
- If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
Code Review
This pull request introduces support for shared expert data parallelism with all-to-all communication for DeepSeek v1 models. The changes primarily involve enabling and configuring sequence parallelism when shared expert DP is active. This is achieved by modifying the forward context, attention implementations, and various custom operators. My review identified a critical issue where an assertion was commented out, potentially hiding a bug related to tensor shapes in residual connections. Other changes appear to be consistent refactorings to support the new feature.
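For context on why residual shapes can diverge here: under sequence parallelism the token dimension is typically padded to a multiple of the TP world size and each rank keeps one chunk, while the residual may still cover all tokens. The sketch below is illustrative only (`sp_chunk`, `tp_size`, and `tp_rank` are hypothetical names, not vllm-ascend APIs):

```python
# Illustrative sketch of sequence-parallel chunking, not the PR's code.
import torch

def sp_chunk(hidden: torch.Tensor, tp_size: int, tp_rank: int) -> torch.Tensor:
    num_tokens = hidden.size(0)
    pad = (-num_tokens) % tp_size  # pad so tokens split evenly across ranks
    if pad:
        hidden = torch.nn.functional.pad(hidden, (0, 0, 0, pad))
    return hidden.chunk(tp_size, dim=0)[tp_rank]

x = torch.randn(10, 8)            # 10 tokens, hidden size 8
local = sp_chunk(x, tp_size=4, tp_rank=1)
print(local.shape)                # torch.Size([3, 8]): 12 padded tokens / 4 ranks
```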
```diff
 if residual is not None:
     residual = torch.ops.vllm.maybe_chunk_residual(x, residual)
-    assert x.size(0) == residual.size(0)
+    # assert x.size(0) == residual.size(0)
```
Commenting out assertions is dangerous, as it can hide underlying bugs. The assertion `x.size(0) == residual.size(0)` was likely failing. Instead of disabling it, the root cause should be investigated and fixed. The preceding line, `residual = torch.ops.vllm.maybe_chunk_residual(x, residual)`, is supposed to ensure the tensor shapes are compatible for the residual connection. If this assertion fails, it indicates a problem with `maybe_chunk_residual` or with the tensors `x` and `residual` passed to it, especially in sequence parallelism scenarios where tensor shapes can be tricky. Please either fix the underlying issue and re-enable the assertion, or provide a detailed explanation of why this assertion is no longer valid.
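A minimal sketch of the contract the review is asking for, assuming `maybe_chunk_residual` is meant to slice the full residual down to the sequence-parallel chunk that `x` holds (the actual vllm-ascend op may work differently; the `tp_rank` bookkeeping here is hypothetical):

```python
# Sketch only: the shape invariant the review wants preserved, with the
# assertion kept in place rather than commented out.
import torch

def maybe_chunk_residual(x: torch.Tensor, residual: torch.Tensor,
                         tp_rank: int = 0) -> torch.Tensor:
    if residual.size(0) != x.size(0):
        chunk = x.size(0)
        # Take this rank's slice of the full residual so shapes match x.
        residual = residual.narrow(0, tp_rank * chunk, chunk)
    assert x.size(0) == residual.size(0)  # invariant the review asks to keep
    return residual

full = torch.randn(12, 8)   # residual covering all 12 (padded) tokens
x = torch.randn(3, 8)       # this rank's sequence-parallel chunk
r = maybe_chunk_residual(x, full, tp_rank=1)
print(r.shape)              # torch.Size([3, 8])
```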
This pull request has conflicts; please resolve them before we can evaluate the pull request.
Signed-off-by: zhaozx-cn <[email protected]>
What this PR does / why we need it?
Does this PR introduce any user-facing change?
How was this patch tested?
vllm-project/vllm@13dd93c